Lingo: Search Results Clustering Algorithm Based on Singular Value Decomposition

Authors

  • Stanislaw Osinski
  • Jerzy Stefanowski
  • Dawid Weiss
Abstract

The search results clustering problem is defined as the automatic, on-line grouping of similar documents in a search results list returned by a search engine. In this paper we present Lingo, a novel algorithm for clustering search results that emphasizes cluster description quality. We describe the methods used in the algorithm: algebraic transformations of the term-document matrix and frequent phrase extraction using suffix arrays. Finally, we discuss the results of an empirical evaluation of the algorithm.

"Knowledge is of two kinds: we know a subject ourselves, or we know where we can find information about it." (Samuel Johnson, 1775)
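As a rough illustration of the frequent phrase extraction mentioned in the abstract, the sketch below builds a word-level suffix array and counts repeated word sequences. It is not the authors' implementation: the function names `word_suffix_array` and `frequent_phrases`, the `min_count` and `max_len` parameters, and the naive sort-based construction are all assumptions made for this example, and it does not model document boundaries or any phrase-completeness criteria the paper may define.

```python
def word_suffix_array(tokens):
    """Return the start positions of all suffixes of `tokens`,
    sorted lexicographically by the token sequence they begin.
    Naive construction (sorts slices); fine for short snippets only."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def frequent_phrases(tokens, min_count=2, max_len=4):
    """Hypothetical helper: map each phrase (tuple of words, up to
    `max_len` long) that occurs at least `min_count` times to its count.

    In a sorted suffix array, all suffixes sharing the same first n
    tokens are adjacent, so each phrase forms one contiguous run."""
    sa = word_suffix_array(tokens)
    counts = {}
    for n in range(1, max_len + 1):
        run_phrase, run_len = None, 0
        for start in sa:
            phrase = tuple(tokens[start:start + n])
            if len(phrase) < n:
                phrase = None  # suffix too short to hold an n-word phrase
            if phrase is not None and phrase == run_phrase:
                run_len += 1
                continue
            # The run ended: record it if it was frequent enough.
            if run_phrase is not None and run_len >= min_count:
                counts[run_phrase] = run_len
            run_phrase = phrase
            run_len = 1 if phrase is not None else 0
        # Record the final run for this phrase length.
        if run_phrase is not None and run_len >= min_count:
            counts[run_phrase] = run_len
    return counts


if __name__ == "__main__":
    text = ("singular value decomposition of a matrix and "
            "singular value decomposition of data")
    for phrase, count in sorted(frequent_phrases(text.split()).items()):
        print(" ".join(phrase), count)
```

Running the script prints each repeated phrase from the sample sentence with its occurrence count. A production version would likely use a linear-time suffix array construction and treat each search result snippet as a separate unit.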

Similar resources

Graph Clustering by Hierarchical Singular Value Decomposition with Selectable Range for Number of Clusters Members

Graphs have many applications in real-world problems. When we deal with a huge volume of data, analyzing the data is difficult or sometimes impossible. In big data problems, clustering is a useful tool for data analysis. Singular value decomposition (SVD) is one of the best algorithms for clustering graphs, but we do not have any choice to select the number of clusters and the number of members ...

A Dimensionless Parameter Approach based on Singular Value Decomposition and Evolutionary Algorithm for Prediction of Carbamazepine Particles Size

Control of drug particle size is one of the most important factors affecting the efficiency of nano-drug production in confined liquid impinging jets. In the present research, a confined liquid impinging jet was used to produce nanoparticles of Carbamazepine. The effects of several parameters such as concentration, solution and anti-solvent flow rate and solvent...

Test of BibTEX references

[1] J. Stefanowski and D. Weiss, “Carrot2 and language properties in web search results clustering,” in Proceedings of AWIC-2003, First International Atlantic Web Intelligence Conference, ser. Lecture Notes in Computer Science, E. M. Ruiz, J. Segovia, and P. S. Szczepaniak, Eds., vol. 2663. Madrid, Spain: Springer, 2003, pp. 240–249. [Online]. Available: http://www.cs.put.poznan.pl/dweiss/xml/ ...

Noise Effects on Modal Parameters Extraction of Horizontal Tailplane by Singular Value Decomposition Method Based on Output Only Modal Analysis

Given the great importance of safety in the aerospace industry, identification of the dynamic parameters of related equipment by experimental tests under operating conditions has been in focus. Due to the existence of noise sources in these conditions, the probability of fault occurrence may increase. This study investigates the effects of noise in the process of modal parameters identification b...

Semantic, Hierarchical, Online Clustering of Web Search Results

Today, the search engine is the most commonly used tool for Web information retrieval; however, its current status is still far from satisfactory. This paper focuses on clustering Web search results in order to help users find relevant Web information more easily and quickly. The main contributions of this paper include the following. (1) The benefits of using key phrases as natural language inform...

Publication date: 2004